Decomposed Process Mining: The ILP Case
Abstract
Over the last decade, process mining techniques have matured, and more and more organizations have started to use process mining to analyze their operational processes. The current hype around “big data” illustrates the desire to analyze ever-growing data sets. Process mining starts from event logs, i.e., multisets of traces (sequences of events), and for the widespread application of process mining it is vital to be able to handle “big event logs”. Some event logs are “big” because they contain many traces; others are big because they contain many different activities. Most of the more advanced process mining algorithms (both for process discovery and conformance checking) scale very poorly in the number of activities. For these algorithms, it could help to split the big event log (containing many activities) into a collection of smaller event logs (each containing fewer activities), run the algorithm on each of these smaller logs, and merge the results into a single result. This paper introduces a generic framework for doing exactly that, and makes it concrete by implementing algorithms for decomposed process discovery and decomposed conformance checking based on Integer Linear Programming (ILP). ILP-based process mining techniques provide precise results and formal guarantees (e.g., perfect fitness), but are known to scale poorly in the number of activities. A small case study shows that we can gain orders of magnitude in run time. However, in some cases there is a tradeoff between run time and quality.
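The split/run/merge idea described above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the paper's implementation. The activity clusters are assumed to be given up front (how to choose them is not shown). The per-sublog miner is a hypothetical stand-in that only collects directly-follows pairs; it is not the ILP-based miner the paper uses. Merging is a plain set union. The names `project`, `split`, `directly_follows`, and `decomposed_discovery` are illustrative only.

```python
from collections import Counter
from typing import Callable, Iterable, List, Set, Tuple

# An event log is a multiset of traces: each trace (a tuple of activity
# names) is mapped to the number of times it occurs.
EventLog = Counter

def project(log: EventLog, activities: frozenset) -> EventLog:
    """Project every trace onto `activities`, keeping multiplicities.
    Distinct traces that become equal after projection are counted together."""
    sublog: EventLog = Counter()
    for trace, count in log.items():
        sublog[tuple(a for a in trace if a in activities)] += count
    return sublog

def split(log: EventLog, clusters: Iterable[frozenset]) -> List[EventLog]:
    """Split one log with many activities into one smaller sublog per cluster."""
    return [project(log, cluster) for cluster in clusters]

def directly_follows(log: EventLog) -> Set[Tuple[str, str]]:
    """Hypothetical stand-in for a per-sublog miner (NOT the ILP miner):
    collects every pair (a, b) where b immediately follows a in some trace."""
    pairs: Set[Tuple[str, str]] = set()
    for trace in log:
        pairs.update(zip(trace, trace[1:]))
    return pairs

def decomposed_discovery(log: EventLog, clusters: Iterable[frozenset],
                         discover: Callable, merge: Callable):
    """Split the log, run `discover` on every sublog, merge the partial results."""
    return merge([discover(sublog) for sublog in split(log, clusters)])

# Usage: two overlapping clusters that share activities b and c.
log = Counter({("a", "b", "c", "d"): 3, ("a", "c", "b", "d"): 2})
clusters = [frozenset({"a", "b", "c"}), frozenset({"b", "c", "d"})]
model = decomposed_discovery(log, clusters, directly_follows,
                             merge=lambda parts: set().union(*parts))
print(sorted(model))
# → [('a', 'b'), ('a', 'c'), ('b', 'c'), ('b', 'd'), ('c', 'b'), ('c', 'd')]
```

In this toy log, the merged result happens to equal the directly-follows pairs of the unsplit log. The sketch makes no claim that this holds for every log or clustering; the abstract itself notes a possible tradeoff between run time and quality.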
Similar resources
Decomposed Process Mining with DivideAndConquer
Many known process mining techniques scale badly in the number of activities in an event log. Examples of such techniques include the ILP Miner and the standard replay, which also uses ILP techniques. To alleviate the problems these techniques face, we can decompose a large problem (with many activities) into a number of small problems (with few activities). The expectation is that the run times o...
Relational Data Mining Applied to Virtual Engineering of Product Designs
Contemporary product design based on 3D CAD tools aims at improved efficiency using integrated engineering environments with access to databases of existing designs, associated documents and enterprise resource planning (ERP). One of the goals of the SEVENPRO project is to achieve design process improvements through the utilization of relational data mining (RDM), utilizing past designs and com...
Supporting Case Acquisition and Labelling in the Context of Web Mining
Case acquisition and labelling are important bottlenecks for predictive data mining. In the web context, a cascade of supporting techniques can be used, from general ones such as user interfaces, through filtering based on keyword frequency, to web-specific techniques exploiting public search engines. We show how a synergistic application of multiple techniques can be helpful in obtaining and p...
Fuzzy ILP Classification of web reports after linguistic text mining
In this paper we study the problem of classification of textual web reports. We are specifically focused on situations in which structured information extracted from the reports is used for classification. We present an experimental classification system based on usage of third party linguistic analyzers, our previous work on web information extraction, and fuzzy inductive logic programming (fu...
Multi-Relational Data Mining, Using UML for ILP
Although there is a growing need for multi-relational data mining solutions in KDD, the use of obvious candidates from the field of Inductive Logic Programming (ILP) has been limited. In our view this is mainly due to the variation in ILP engines, especially with respect to input specification, as well as the limited attention for relational database issues. In this paper we describe an approac...